Robust Behavior Cloning Via Global Lipschitz Regularization

Wu, Shili, Jin, Yizhao, Niu, Puhua, Datta, Aniruddha, Andersson, Sean B.

arXiv.org Artificial Intelligence

Behavior Cloning (BC) is an effective imitation learning technique and has even been adopted in some safety-critical domains such as autonomous vehicles. BC trains a policy to mimic the behavior of an expert using a dataset composed only of state-action pairs demonstrated by the expert, without any additional interaction with the environment. However, during deployment, the policy's observations may contain measurement errors or adversarial disturbances. Since the observations may deviate from the true states, they can mislead the agent into taking sub-optimal actions. In this work, we use a global Lipschitz regularization approach to enhance the robustness of the learned policy network. We show that the resulting global Lipschitz property provides a robustness certificate for the policy with respect to different bounded-norm perturbations. We then propose a way to construct a Lipschitz neural network that ensures the policy's robustness. We empirically validate our theory across various environments in Gymnasium.

Keywords: Robust Reinforcement Learning; Behavior Cloning; Lipschitz Neural Network
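A minimal NumPy sketch of the certification idea the abstract describes (this is an illustrative assumption, not the authors' exact construction): for a ReLU MLP, the product of the layers' spectral norms upper-bounds the network's global Lipschitz constant, and rescaling each weight matrix caps that bound.

```python
import numpy as np

def spectral_norm(W):
    # Largest singular value of W.
    return np.linalg.svd(W, compute_uv=False)[0]

def lipschitz_bound(weights):
    # For a ReLU MLP, the product of per-layer spectral norms
    # upper-bounds the global Lipschitz constant.
    return float(np.prod([spectral_norm(W) for W in weights]))

def enforce_lipschitz(weights, L=1.0):
    # Rescale each layer so the product of spectral norms is at most L.
    target = L ** (1.0 / len(weights))
    out = []
    for W in weights:
        s = spectral_norm(W)
        out.append(W if s <= target else W * (target / s))
    return out

rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 4)), rng.standard_normal((2, 8))]
capped = enforce_lipschitz(weights, L=1.0)
print(lipschitz_bound(capped) <= 1.0 + 1e-9)  # True
```

With the bound capped at L, an l2 observation perturbation of size epsilon can move the policy output by at most L * epsilon, which is the kind of robustness certificate the abstract refers to.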


VB-Com: Learning Vision-Blind Composite Humanoid Locomotion Against Deficient Perception

Ren, Junli, Huang, Tao, Wang, Huayi, Wang, Zirui, Ben, Qingwei, Pang, Jiangmiao, Luo, Ping

arXiv.org Artificial Intelligence

The performance of legged locomotion is closely tied to the accuracy and comprehensiveness of state observations. Blind policies, which rely solely on proprioception, are considered highly robust due to the reliability of proprioceptive observations. However, these policies significantly limit locomotion speed and often require collisions with the terrain to adapt. In contrast, vision policies allow the robot to plan motions in advance and respond proactively to unstructured terrains through an online perception module. However, perception is often compromised by noisy real-world environments, potential sensor failures, and the limitations of current simulations in representing dynamic or deformable terrains. Humanoid robots, with high degrees of freedom and inherently unstable morphology, are particularly susceptible to misguidance from deficient perception, which can result in falls or termination on challenging dynamic terrains. To leverage the advantages of both vision and blind policies, we propose VB-Com, a composite framework that enables humanoid robots to determine when to rely on the vision policy and when to switch to the blind policy under perceptual deficiency. We demonstrate that VB-Com effectively enables humanoid robots to traverse challenging terrains and obstacles despite perception deficiencies caused by dynamic terrains or perceptual noise.
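A hedged sketch of the composite idea the abstract describes: run the vision policy while a perception signal looks healthy, and fall back to the blind (proprioception-only) policy when it degrades. The confidence function, the `depth_valid_fraction` field, and the threshold are hypothetical stand-ins, not VB-Com's actual switching criterion.

```python
def composite_action(obs, vision_policy, blind_policy, confidence, threshold=0.5):
    # Use the vision policy only when perception confidence is above threshold;
    # otherwise fall back to the robust blind policy.
    if confidence(obs) >= threshold:
        return vision_policy(obs)
    return blind_policy(obs)

# Toy usage with stand-in policies and a hypothetical confidence signal.
vision = lambda obs: "vision_action"
blind = lambda obs: "blind_action"
conf = lambda obs: obs["depth_valid_fraction"]  # hypothetical sensor-health proxy

print(composite_action({"depth_valid_fraction": 0.9}, vision, blind, conf))  # vision_action
print(composite_action({"depth_valid_fraction": 0.1}, vision, blind, conf))  # blind_action
```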


Review for NeurIPS paper: Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations

Neural Information Processing Systems

Clarity: *** Derivations in Section 3 *** While the theorems across Section 3.1 seem reasonable, I would have liked a more self-contained presentation of the theorems together with their proofs. Assumption 2 (bounded adversary power) is a bit strange; while the experimental implementation (with the norm ball around s) seems reasonable for many environments, it should probably be defined in a better way. The authors refer to the Appendix a lot, and in my opinion those derivations are necessary for the reader to follow along; as written, I cannot really follow how the authors arrive at the results. Add plots (similar to Appendix I, Figure 12).